AITopics | iforest 0

Between Resolution Collapse and Variance Inflation: Weighted Conformal Anomaly Detection in Low-Data Regimes

Hennhöfer, Oliver, Preisach, Christine

arXiv.org Machine Learning, Mar-25-2026

Standard conformal anomaly detection provides marginal finite-sample guarantees under the assumption of exchangeability. However, real-world data often exhibit distribution shifts, necessitating a weighted conformal approach to adapt to local non-stationarity. We show that this adaptation induces a critical trade-off between the minimum attainable p-value and its stability. As importance weights localize to relevant calibration instances, the effective sample size decreases. This can render standard conformal p-values overly conservative for effective error control, while the smoothing technique used to mitigate this issue introduces conditional variance, potentially masking anomalies. We propose a continuous inference relaxation that resolves this dilemma by decoupling local adaptation from tail resolution via continuous weighted kernel density estimation. While relaxing finite-sample exactness to asymptotic validity, our method eliminates Monte Carlo variability and recovers the statistical power lost to discretization. Empirical evaluations confirm that our approach not only restores detection capabilities where discrete baselines yield zero discoveries, but outperforms standard methods in statistical power while maintaining valid marginal error control in practice.
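The trade-off described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used weighted conformal p-value (weighted share of calibration scores at least as extreme as the test score, with the test point's own weight in numerator and denominator) and Kish's effective sample size; the function names and the weight convention are illustrative assumptions, not the paper's exact definitions, and the kernel density relaxation the authors propose is not shown.

```python
def weighted_conformal_pvalue(cal_scores, cal_weights, test_score, test_weight, u=None):
    """Weighted conformal p-value; larger scores mean 'more anomalous'.

    u=None -> conservative (non-randomized) p-value.
    u in [0, 1] -> randomized (smoothed) variant, which depends on the draw u.
    """
    total = sum(cal_weights) + test_weight
    greater = sum(w for s, w in zip(cal_scores, cal_weights) if s > test_score)
    ties = sum(w for s, w in zip(cal_scores, cal_weights) if s == test_score)
    if u is None:
        return (greater + ties + test_weight) / total
    return (greater + u * (ties + test_weight)) / total


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def min_pvalue(cal_weights, test_weight):
    """Smallest conservative p-value any test point can attain."""
    return test_weight / (sum(cal_weights) + test_weight)


# Hypothetical weights: uniform versus concentrated on 9 of 99 calibration points.
uniform = [1.0] * 99
localized = [1.0] * 9 + [0.0] * 90
for name, w in [("uniform", uniform), ("localized", localized)]:
    print(name, effective_sample_size(w), min_pvalue(w, 1.0))
```

With 99 equal weights the p-value floor is 1/100; concentrating all weight on 9 points raises it to 1/10, so no test point could pass a threshold below 0.1. Passing a random `u` lets the p-value fall below that floor, at the cost of a result that changes from draw to draw.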

data mining, machine learning, weightededf 0, (17 more...)

arXiv.org Machine Learning

2603.23205

Country:

Europe > Germany (0.14)
North America > United States > Wisconsin (0.05)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.70)

Automated Quality Control for Language Documentation: Detecting Phonotactic Inconsistencies in a Kokborok Wordlist

van Dam, Kellen Parker, Stephen, Abishek

arXiv.org Artificial Intelligence, Oct-27-2025

Lexical data collection in language documentation often contains transcription errors and undocumented borrowings that can mislead linguistic analysis. We present unsupervised anomaly detection methods to identify phonotactic inconsistencies in wordlists, applying them to a multilingual dataset of Kokborok varieties with Bangla. Using character-level and syllable-level phonotactic features, our algorithms identify potential transcription errors and borrowings. While precision and recall remain modest due to the subtle nature of these anomalies, syllable-aware features significantly outperform character-level baselines. The high-recall approach provides fieldworkers with a systematic method to flag entries requiring verification, supporting data quality improvement in low-resourced language documentation.
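As a rough illustration of flagging phonotactically unusual entries, the sketch below scores each word by its average character-bigram surprisal under an add-one-smoothed model trained on the wordlist itself. This is a swapped-in character-level baseline chosen for brevity, not the authors' algorithm (the abstract reports that syllable-aware features work better), and the toy wordlist is invented, not Kokborok data.

```python
import math
from collections import Counter


def train_bigram_model(words):
    """Count character bigrams, using '#' as a word-boundary marker."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        padded = "#" + word + "#"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(alphabet)


def avg_surprisal(word, model):
    """Mean negative log-probability per bigram (add-one smoothing)."""
    bigrams, contexts, vocab_size = model
    padded = "#" + word + "#"
    pairs = list(zip(padded, padded[1:]))
    total = 0.0
    for a, b in pairs:
        prob = (bigrams[(a, b)] + 1) / (contexts[a] + vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


# Invented toy wordlist; "xqzk" stands in for a transcription error.
wordlist = ["tama", "tami", "mata", "mita", "nama", "tana", "mana", "nami", "xqzk"]
model = train_bigram_model(wordlist)
ranked = sorted(wordlist, key=lambda w: avg_surprisal(w, model), reverse=True)
print(ranked[:3])  # highest-surprisal entries are candidates for manual review
```

Ranking rather than thresholding matches the high-recall use case in the abstract: a fieldworker can review entries from the top of the list down.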

data mining, iforest 0, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2510.21584

Country:

Europe > Germany (0.14)
Asia > India (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection

Elbasheer, Mohamed, Akinfaderin, Adewale

arXiv.org Artificial Intelligence, Jul-11-2025

Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance, notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.
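The reported rates are linked by standard identities: recall equals one minus the false negative rate, and specificity equals one minus the false positive rate. The short check below applies these identities to the figures quoted in the abstract; the helper names are illustrative, and the specificity value is derived here rather than reported by the authors.

```python
def recall_from_fnr(fnr):
    """Recall (true positive rate) = 1 - false negative rate."""
    return 1.0 - fnr


def specificity_from_fpr(fpr):
    """Specificity (true negative rate) = 1 - false positive rate."""
    return 1.0 - fpr


recall = recall_from_fnr(0.0057)            # FNR quoted in the abstract
specificity = specificity_from_fpr(0.0571)  # FPR quoted in the abstract
print(f"recall = {recall:.2%}, specificity = {specificity:.2%}")
# prints: recall = 99.43%, specificity = 94.29%
```

The derived recall matches the 99.43% stated in the abstract, so the recall and false negative figures carry the same information.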

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.23446

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection

Cao, Yang, Yang, Sikun, Li, Chen, Xiang, Haolong, Qi, Lianyong, Liu, Bo, Li, Rongsheng, Liu, Ming

arXiv.org Artificial Intelligence, Jan-21-2025

Anomaly detection is a critical task in machine learning, with applications ranging from fraud detection and content moderation to user behavior analysis (Pang et al., 2021). Within natural language processing (NLP), anomaly detection has become increasingly relevant for identifying outliers such as harmful content, phishing attempts, and spam reviews. However, while AD tasks in structured data (e.g., tabular, time series, graphs) (Steinbuss and Böhm, 2021; Blázquez-García et al., 2021; Qiao et al., 2024) have achieved significant maturity, ...

Existing studies often lack systematic evaluations of how different embeddings perform across diverse anomaly types, raising questions about their generalization capabilities in complex, real-world scenarios such as multilingual settings or domain-specific anomalies. Recent efforts, such as AD-NLP (Bejan et al., 2023) and NLP-ADBench (Li et al., 2024), have significantly advanced anomaly detection in NLP. AD-NLP provides valuable insights into different types of anomalies, while NLP-ADBench expands evaluations to a wide range of algorithms and datasets.

data mining, detection, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2501.1196

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Testing for Outliers with Conformal p-values

Bates, Stephen, Candès, Emmanuel, Lei, Lihua, Romano, Yaniv, Sesia, Matteo

arXiv.org Machine Learning, Apr-19-2021

This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
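The multiple-testing setup in this abstract can be sketched as follows, assuming the standard marginal conformal p-value (rank of the test score among calibration scores, with a +1 correction) followed by the Benjamini-Hochberg procedure. The function names and the synthetic scores are illustrative assumptions; the calibration-conditional p-values introduced in the paper are not shown.

```python
def conformal_pvalue(cal_scores, test_score):
    """Marginal conformal p-value; larger scores mean 'more outlying'."""
    n = len(cal_scores)
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (1 + at_least) / (n + 1)


def benjamini_hochberg(pvalues, alpha):
    """Return sorted indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Synthetic calibration scores from inliers (hypothetical values).
cal_scores = [i / 100 for i in range(99)]
test_scores = [0.305, 2.0, 0.555, 3.0, 0.105]
pvalues = [conformal_pvalue(cal_scores, s) for s in test_scores]
outliers = benjamini_hochberg(pvalues, alpha=0.1)
print(pvalues, outliers)
```

All test points share one calibration set, which is why the resulting p-values are mutually dependent, as the abstract notes. With 99 calibration scores, the smallest possible p-value is 1/100, attained here by the two test scores that exceed every calibration score.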

artificial intelligence, conformal p-value, health & medicine, (19 more...)

arXiv.org Machine Learning

2104.08279

Country:

North America > United States > California (0.27)
Asia > Middle East (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Government (0.67)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)
